Effect of Data Imbalance on Unsupervised Domain Adaptation of Part-of-Speech Tagging and Pivot Selection Strategies

Authors

  • Xia Cui
  • Frans Coenen
  • Danushka Bollegala
Abstract

Domain adaptation is the task of transforming a model trained using data from a source domain to a different target domain. In Unsupervised Domain Adaptation (UDA), we do not assume any labelled training data from the target domain. In this paper, we consider the problem of UDA in the context of Part-of-Speech (POS) tagging. Specifically, we study the effect of data imbalance on UDA of POS tagging, and compare different pivot selection strategies for accurately adapting a POS tagger trained using some source domain data to a target domain. We propose the use of F-score to select pivots using available labelled data in the source domain. Our experimental results on a benchmark dataset for cross-domain POS tagging show that using frequency combined with F-scores for selecting pivots in the source labelled data produces the best results.
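The abstract does not say how the F-score of a candidate pivot is computed or how it is combined with frequency, so the following is only a minimal sketch under stated assumptions, not the authors' method. It assumes that each word is treated as a binary predictor of each POS tag in the labelled source data, that the F-score is the harmonic mean of the resulting precision and recall, and that frequency and F-score are combined by a simple product. The function names `f_score`, `pivot_scores`, and `select_pivots` are hypothetical.

```python
from collections import Counter


def f_score(precision, recall):
    """Harmonic mean of precision and recall (F1)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def pivot_scores(tagged_tokens):
    """Score each word in labelled source data.

    tagged_tokens: iterable of (word, tag) pairs.
    Assumption: a word's score is its frequency multiplied by its best
    F-score over all tags, where the word's presence is treated as a
    binary predictor of the tag.
    """
    pairs = [(word, tag) for word, tag in tagged_tokens]
    word_count = Counter(word for word, _ in pairs)
    tag_count = Counter(tag for _, tag in pairs)
    pair_count = Counter(pairs)

    scores = {}
    for (word, tag), n in pair_count.items():
        precision = n / word_count[word]  # how often the word carries this tag
        recall = n / tag_count[tag]       # share of the tag covered by the word
        combined = word_count[word] * f_score(precision, recall)
        scores[word] = max(scores.get(word, 0.0), combined)
    return scores


def select_pivots(tagged_tokens, k):
    """Return the k highest-scoring words, ties broken alphabetically."""
    scores = pivot_scores(tagged_tokens)
    return sorted(scores, key=lambda w: (-scores[w], w))[:k]
```

Calling `select_pivots(source_tokens, k)` on a list of `(word, tag)` pairs returns the `k` top-ranked candidate pivots. Other combinations, such as a weighted sum or ranking within each tag to counter data imbalance, would fit the abstract's description equally well.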


Similar resources

Online Updating of Word Representations for Part-of-Speech Tagging

We propose online unsupervised domain adaptation (DA), which is performed incrementally as data comes in and is applicable when batch DA is not possible. In a part-of-speech (POS) tagging evaluation, we find that online unsupervised DA performs as well as batch DA.


Deep Unsupervised Domain Adaptation for Image Classification via Low Rank Representation Learning

Domain adaptation is a powerful technique when a large amount of labeled data with similar attributes is available in different domains. Real-world applications involve huge amounts of data, but most of it is unlabeled. Domain adaptation is effective in image classification, where obtaining adequate labeled data is expensive and time-consuming. We propose a novel method named DALRRL, which consists of deep ...


Unsupervised Domain Adaptation for Joint Segmentation and POS-Tagging

Sophisticated models have been developed for joint word segmentation and part-of-speech tagging, with increasing accuracies reported on the Chinese Treebank data. These systems, which rely on supervised learning, typically perform worse on texts from a different domain, for which little annotation is available. We consider self-training and character clustering for domain adaptation. Both metho...


An improved joint model: POS tagging and dependency parsing

Dependency parsing is a form of syntactic parsing in natural language processing that automatically analyzes the dependency structure of sentences, producing a dependency graph for each input sentence. Part-of-speech (POS) tagging is a prerequisite for dependency parsing. Generally, dependency parsers perform POS tagging and dependency parsing in a pipeline. Unfortunately, in pipel...


Part-of-Speech Tagging of Persian Using a Fuzzy Network Model

Part-of-speech tagging (POS tagging) is an active research topic in natural language processing (NLP). The process of classifying words into their parts of speech and labeling them accordingly is known as part-of-speech tagging, POS tagging, or simply tagging. Parts of speech are also known as word classes or lexical categories. The purpose of POS tagging is determining the grammatical ...




Publication date: 2017